Voice signal encoding and decoding method, device, and codec system

ABSTRACT

A voice signal encoding and decoding method, device, and codec system are provided. The encoding method includes: encoding an input voice signal to obtain a broadband code stream, where the broadband code stream includes a core layer bit stream and an extension enhancement layer bit stream (101); compressing the core layer bit stream to obtain a compressed code stream (102); and packing the compressed code stream and the extension enhancement layer bit stream to obtain a packed code stream (103). The core layer bit stream is compressed, and the compressed code stream and the extension enhancement layer bit stream are packed, thereby reducing the transmission bandwidth occupied by the input voice signal. Since broadband voice encoding is performed on the input voice signal, a broadband voice code stream is transmitted by using narrowband transmission bandwidth, thereby improving the cost performance of voice signal transmission.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. patent application Ser. No. 13/632,905, filed on Oct. 1, 2012, which claims priority to International Application No. PCT/CN2011/072570, filed on Apr. 9, 2011, which claims priority to Chinese Patent Application No. 201010147586.6, filed on Apr. 9, 2010, all of which are hereby incorporated by reference in their entireties.

TECHNICAL FIELD

Embodiments of the present invention relate to the field of audio processing technologies, and in particular, to a voice signal encoding and decoding method, device, and codec system.

BACKGROUND

In real-time voice network transmission, in order to improve voice quality, a real-time voice transmission system needs to select a suitable voice compression algorithm and transmission method according to factors such as the actual network capability, transmission bandwidth, delay, complexity, and voice quality, so as to improve the cost performance of voice signal transmission as much as possible.

Among the numerous voice codecs, G.711 is widely used in practice because of advantages such as a simple algorithm, strong robustness, and short delay. The International Telecommunication Union Telecommunication Standardization Sector (ITU-T) has proposed two extended voice encoder standards, G.711.1 and G.711.0, based on the narrowband voice encoder G.711. In G.711.1, broadband extension is performed based on the G.711 narrowband to implement broadband voice quality. In G.711.0, lossless compression is performed on a G.711 code stream to reduce transmission bandwidth by about 50%, thereby improving transmission quality of a voice signal during network congestion.

During implementation of the present invention, the inventor finds that G.711.1 and G.711.0 in the prior art cannot reduce the transmission bandwidth and improve the voice quality at the same time.

SUMMARY

Embodiments of the present invention provide a voice signal encoding and decoding method, device, and codec system, to improve the cost performance of voice signal transmission.

An embodiment of the present invention provides a voice signal encoding method, where the method includes:

encoding an input voice signal to obtain a broadband code stream, where the broadband code stream includes a core layer bit stream and an extension enhancement layer bit stream;

compressing the core layer bit stream to obtain a compressed code stream; and

packing the compressed code stream and the extension enhancement layer bit stream to obtain a packed code stream.

An embodiment of the present invention provides a voice signal encoding device, where the device includes:

a first processing module, configured to encode an input voice signal to obtain a broadband code stream, where the broadband code stream includes a core layer bit stream and an extension enhancement layer bit stream;

a second processing module, configured to compress the core layer bit stream to obtain a compressed code stream; and

a third processing module, configured to pack the compressed code stream and the extension enhancement layer bit stream to obtain a packed code stream.

An embodiment of the present invention provides a voice signal decoding method, where the method includes:

acquiring header information in a packed code stream;

unpacking the packed code stream according to the header information, to obtain an extension enhancement layer bit stream and a compressed core layer bit stream;

decompressing the compressed core layer bit stream to obtain a decompressed code stream; and

performing decoding reestablishment on the extension enhancement layer bit stream and the decompressed code stream, to obtain a broadband reestablished voice signal.

An embodiment of the present invention provides a voice signal decoding device, where the device includes:

an acquisition module, configured to acquire header information in a packed code stream;

an unpacking module, configured to unpack the packed code stream according to the header information, to obtain an extension enhancement layer bit stream and a compressed core layer bit stream;

a decompression module, configured to decompress the compressed core layer bit stream to obtain a decompressed code stream; and

a reestablishment module, configured to perform decoding reestablishment on the extension enhancement layer bit stream and the decompressed code stream, to obtain a broadband reestablished voice signal.

An embodiment of the present invention provides a voice signal codec system, including a voice signal encoding device and a voice signal decoding device, where

the voice signal encoding device is configured to: encode an input voice signal to obtain a broadband code stream, where the broadband code stream includes a core layer bit stream and an extension enhancement layer bit stream; compress the core layer bit stream to obtain a compressed code stream; pack the compressed code stream and the extension enhancement layer bit stream to obtain a packed code stream; and send the packed code stream to the voice signal decoding device; and

the voice signal decoding device is configured to: acquire header information from the packed code stream sent by the voice signal encoding device; unpack the packed code stream according to the header information to obtain the extension enhancement layer bit stream and the compressed core layer bit stream; decompress the compressed core layer bit stream to obtain a decompressed code stream; and perform decoding reestablishment on the extension enhancement layer bit stream and the decompressed code stream, to obtain a broadband reestablished voice signal.

In the voice signal encoding and decoding method, device, and codec system provided by the embodiments of the present invention, the core layer bit stream is compressed, and the compressed code stream and the extension enhancement layer bit stream are packed, thereby reducing the transmission bandwidth occupied by the input voice signal. Since broadband voice encoding is performed on the input voice signal, a broadband voice code stream is transmitted by using narrowband transmission bandwidth, thereby improving the cost performance of voice signal transmission.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic flow chart of an embodiment of a voice signal encoding method of the present invention;

FIG. 2 is a schematic flow chart of another embodiment of a voice signal encoding method of the present invention;

FIG. 3 is a schematic structural diagram of an embodiment of a voice signal encoding device of the present invention;

FIG. 4 is a schematic structural diagram of another embodiment of a voice signal encoding device of the present invention;

FIG. 5 is a schematic flow chart of an embodiment of a voice signal decoding method of the present invention;

FIG. 6 is a schematic flow chart of another embodiment of a voice signal decoding method of the present invention;

FIG. 7 is a schematic structural diagram of an embodiment of a voice signal decoding device of the present invention;

FIG. 8 is a schematic structural diagram of another embodiment of a voice signal decoding device of the present invention;

FIG. 9 is a schematic structural diagram of an embodiment of a voice codec system of the present invention;

FIG. 10 is a schematic structural diagram of a system applicable in an embodiment of the present invention;

FIG. 11 is a schematic diagram of a code stream formed at an encoding end in the embodiment shown in FIG. 10;

FIG. 12 is a schematic diagram of a code stream formed at a decoding end in the embodiment shown in FIG. 10;

FIG. 13 is another schematic diagram of a code stream formed at an encoding end in the embodiment shown in FIG. 10;

FIG. 14 is another schematic diagram of a code stream formed at a decoding end in the embodiment shown in FIG. 10;

FIG. 15 is yet another schematic diagram of a code stream formed at an encoding end in the embodiment shown in FIG. 10; and

FIG. 16 is yet another schematic diagram of a code stream formed at a decoding end in the embodiment shown in FIG. 10.

DETAILED DESCRIPTION

The technical solutions according to embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention. It is obvious that the embodiments to be described are only a part rather than all of the embodiments of the present invention. All other embodiments obtained by persons skilled in the art based on the embodiments of the present invention without creative efforts shall fall within the protection scope of the present invention.

In the embodiments of the present invention, if a sampling rate of a voice signal is 8 kHz, the voice signal is a narrowband signal; if a sampling rate of a voice signal is higher than 8 kHz, the voice signal is a broadband signal. Moreover, the narrowband signal and the broadband signal are relative concepts, and the sampling rate of 8 kHz is not intended to limit the embodiments of the present invention.

FIG. 1 is a schematic flow chart of an embodiment of a voice signal encoding method of the present invention. As shown in FIG. 1, the embodiment of the present invention includes the following steps.

Step 101: Encode an input voice signal to obtain a broadband code stream, where the broadband code stream includes a core layer bit stream and an extension enhancement layer bit stream.

Step 102: Compress the core layer bit stream to obtain a compressed code stream.

Step 103: Pack the compressed code stream and the extension enhancement layer bit stream to obtain a packed code stream.

In the voice signal encoding method provided by the embodiment of the present invention, the core layer bit stream is compressed, and the compressed code stream and the extension enhancement layer bit stream are packed, thereby reducing the transmission bandwidth occupied by the input voice signal. Since broadband voice encoding is performed on the input voice signal, a broadband voice code stream is transmitted by using narrowband transmission bandwidth, thereby improving the cost performance of voice signal transmission.
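The bandwidth effect described above can be illustrated with simple bit-rate arithmetic. The following is only a sketch under stated assumptions: the 64 kbit/s core rate follows from 8 kHz sampling at 8 bits per sample for G.711, the 16 kbit/s rate per extension enhancement layer is an assumed figure, the 0.5 compression ratio is the approximate saving quoted in the background for G.711.0 rather than a guaranteed value, and header overhead is ignored. The function name `packed_rate_kbps` is hypothetical.

```python
# Illustrative bit-rate arithmetic; every figure below is an assumption, not a measurement.
CORE_KBPS = 64.0          # G.711 core layer: 8 kHz sampling x 8 bits per sample
EXT_KBPS = 16.0           # assumed rate of one extension enhancement layer
COMPRESSION_RATIO = 0.5   # "about 50%" saving quoted for G.711.0 in the background


def packed_rate_kbps(num_ext_layers: int, ratio: float = COMPRESSION_RATIO) -> float:
    """Approximate rate of the packed stream, ignoring header overhead."""
    return CORE_KBPS * ratio + EXT_KBPS * num_ext_layers


for layers in (1, 2):
    before = CORE_KBPS + EXT_KBPS * layers
    after = packed_rate_kbps(layers)
    print(f"{layers} extension layer(s): {before:.0f} kbit/s -> about {after:.0f} kbit/s")
```

Under these assumptions the packed broadband stream needs no more than the 64 kbit/s of an uncompressed narrowband G.711 stream. The actual saving depends on how well the core layer compresses for a given input signal.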

FIG. 2 is a schematic flow chart of another embodiment of a voice signal encoding method of the present invention. As shown in FIG. 2, the embodiment of the present invention includes the following steps.

Step 201: Encode an input voice signal to obtain a broadband code stream, where the broadband code stream includes a core layer bit stream and an extension enhancement layer bit stream.

The core layer bit stream may be specifically a narrowband voice code stream. The narrowband voice code stream is obtained by encoding a narrowband signal. The extension enhancement layer bit stream may specifically include a narrowband enhancement bit stream and/or a broadband enhancement bit stream. The narrowband enhancement bit stream is specifically an enhancement bit stream with narrowband voice encoding quality. The broadband enhancement bit stream is specifically an enhancement bit stream with broadband voice encoding quality. Specifically, if the input voice signal is encoded by using a G.711.1 encoder to obtain a broadband code stream, the core layer bit stream is specifically a G.711 bit stream (bits) and the extension enhancement layer bit stream is a G.711.1 extension bit stream (ext bits).

Step 202: Combine core layer bit streams in at least two data frames to obtain a data packet corresponding to a combined core layer bit stream.

Specifically, if the input voice signal is encoded through the G.711.1 encoder to obtain the broadband code stream in step 201, since the core layer bit stream is specifically a G.711 bit stream (bits), step 202 is specifically: combine G.711 bit streams (bits) in at least two frames to obtain a data packet formed by combining the at least two data frames.

Step 203: Determine frame length information during lossless compression performed on the data packet.

Specifically, the frame length information during the lossless compression performed on the data packet may be determined in the following three manners.

In a first manner, if a packet length of the data packet to be processed is less than or equal to a longest frame length during the lossless compression, the determining the frame length information during the lossless compression performed on the data packet is: if the packet length of the data packet is equal to an available frame length during the lossless compression, determining that a frame length during the lossless compression is the packet length of the data packet; if the packet length of the data packet is not equal to an available frame length during the lossless compression, determining that the frame length during the lossless compression is a longest available frame length less than the packet length of the data packet to be processed.

If a packet length of the data packet to be processed is greater than a longest frame length during the lossless compression, the determining the frame length information during the lossless compression performed on the data packet is: determining that a frame length processed currently is the longest frame length during the lossless compression; or determining that a frame length processed currently is a second longest frame length corresponding to the longest frame length during the lossless compression.

In a second manner, a frame length of a first frame during the lossless compression is determined. If the packet length of the data packet is an integral multiple of the frame length of the first frame, it is determined that the frame length of the remaining frames during the lossless compression is the frame length of the first frame.

When the packet length of the data packet is not an integral multiple of the frame length of the first frame, if a packet length of the data packet to be processed is greater than or equal to the frame length of the first frame, a frame length processed currently is equal to the frame length of the first frame; if a packet length of the data packet to be processed is less than the frame length of the first frame, a frame length processed currently is a longest available frame length less than the packet length of the data packet to be processed.

In a third manner, the frame length information during the lossless compression performed on the data packet is determined by combining the foregoing two manners. Certainly, the foregoing three manners are described as examples only and are not intended to limit the manner of determining the frame length information during the lossless compression in the embodiment of the present invention.

Step 204: Perform the lossless compression on the combined core layer bit stream to obtain a compressed code stream.

In step 204, if the lossless compression is performed on the combined core layer bit stream by using a G.711.0 encoder to obtain the compressed code stream, the compressed code stream is specifically a G.711.0 bit stream (bits).

Moreover, in step 202 to step 204, in the case in which core layer bit streams of multiple data frames are packed into a data packet, frame length information during lossless compression performed on the data packet can be flexibly determined according to a type of a voice transmission network and/or a type of the input voice signal.

Step 205: Recombine the compressed code stream and the extension enhancement layer bit stream to form a recombined code stream.

Specifically, extension enhancement layer bit streams of all frames are recombined, and the extension enhancement layer bit stream after the recombination is set behind the compressed code stream to form a recombined code stream.

Step 206: Add header information including side information into the recombined code stream, to obtain a packed code stream.

In an actual application, the side information may include packet header information in the real-time transport protocol (Real-time Transport Protocol, RTP), or may include payload header information (Payload Header) in the RTP. The payload header information may be encoding mode information of the G.711.1. Moreover, the side information may also include information that can be used for calculating the packet length pl of the data packet, the number N of frames during the lossless compression performed by a lossless compressor, and a frame length fl during the lossless compression.
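Steps 205 and 206 can be sketched as follows. This is a minimal illustration, not the format defined by this embodiment or by any RTP payload specification: the header layout (packet length pl as a 16-bit field, frame count N as an 8-bit field, the byte length of the compressed code stream as a 16-bit field, then N one-byte frame lengths) is a hypothetical choice, and the function name `pack_code_stream` is likewise an assumption. The compressed-length field is added here only so that a receiver can locate the boundary between the compressed code stream and the extension bits.

```python
import struct


def pack_code_stream(compressed: bytes, ext_frames: list[bytes],
                     packet_len_ms: int, frame_lens_ms: list[int]) -> bytes:
    """Build header + compressed core stream + recombined extension bits."""
    # Step 205: recombine the extension bits of all frames into one contiguous block.
    ext_block = b"".join(ext_frames)
    n = len(frame_lens_ms)
    # Hypothetical header: pl (uint16), N (uint8), compressed length (uint16),
    # followed by N frame lengths (uint8 each), all in network byte order.
    header = struct.pack("!HBH", packet_len_ms, n, len(compressed))
    header += struct.pack(f"!{n}B", *frame_lens_ms)
    # Step 206: the header goes first; the extension block sits behind the compressed stream.
    return header + compressed + ext_block
```

A real system would more likely carry this side information in the RTP header or payload header, as the paragraph above notes.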

In the voice signal encoding method provided by the embodiment of the present invention, the core layer bit stream is compressed, and the compressed code stream and the extension enhancement layer bit stream are packed, thereby reducing the transmission bandwidth occupied by the input voice signal. Since broadband voice encoding is performed on the input voice signal, a broadband voice code stream is transmitted by using narrowband transmission bandwidth, thereby improving the cost performance of voice signal transmission.

In order to understand the technical solution of the embodiment shown in FIG. 2 more clearly, the technical solution is illustrated below by taking an example in which broadband encoding is performed through the G.711.1 encoder and the lossless compression is performed through the G.711.0 encoder.

In step 202, if the packet length of the data packet corresponding to the combined core layer bit stream is pl and the frame length during the lossless compression is fl, any combination in which the sum of the frame lengths of all data frames in the data packet is equal to the packet length pl may be a combination manner in this embodiment, that is, all combinations of $fl_{n}$ which satisfy

$pl = \sum_{n=1}^{N} fl_{n},$

where N is the number of frames that can be processed in a data packet and varies with the selection of different combinations of $fl_{n}$. For example, when the G.711.0 encoder is adopted for the lossless compression of a 35 ms data packet, so that the packet length pl is 35 ms, the combination manners are as follows:

pl = fl₁ + fl₂, where fl₁ = 30, fl₂ = 5, N = 2; or

pl = fl₁ + fl₂ + fl₃, where fl₁ = 20, fl₂ = 10, fl₃ = 5, N = 3; or

…

pl = fl₁ + … + fl₇, where fl₁ = 5, fl₂ = 5, fl₃ = 5, fl₄ = 5, fl₅ = 5, fl₆ = 5, fl₇ = 5, N = 7.
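The combination manners for a given packet length can be enumerated mechanically. The sketch below is only illustrative: it uses the frame lengths that the G.711.0 encoder can process (5, 10, 20, 30 and 40 ms), lists each combination once with the frames in non-increasing order, and does not enumerate reorderings of the same frame lengths. The function name `frame_combinations` is hypothetical.

```python
AVAILABLE_MS = (40, 30, 20, 10, 5)  # G.711.0 frame lengths, longest first


def frame_combinations(pl: int, largest: int = 40) -> list[list[int]]:
    """Return every non-increasing list of available frame lengths summing to pl."""
    if pl == 0:
        return [[]]
    combos = []
    for fl in AVAILABLE_MS:
        if fl <= min(pl, largest):
            # Fix the first frame, then split the remainder with frames no longer than it.
            for rest in frame_combinations(pl - fl, fl):
                combos.append([fl] + rest)
    return combos


for combo in frame_combinations(35):
    print(combo, "N =", len(combo))
```

For pl = 35 the listing starts with the two-frame split 30 + 5 and ends with seven 5 ms frames, matching the first and last manners given above.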

Furthermore, based on the foregoing allowable combination manners, the header information needs to be adaptively modified for different combination manners, and may also include information that can be used for calculating the packet length pl and the number N of frames, so that a receiving end performs corresponding decoding processing according to the side information.

In step 203, if the G.711.0 encoder is adopted to compress the core layer bit stream, since the frame lengths that can be processed when the G.711.0 encoder performs the lossless compression are 5 ms, 10 ms, 20 ms, 30 ms, and 40 ms, the longest frame length in the embodiment of the present invention is 40 ms, and the available frame lengths are 5 ms, 10 ms, 20 ms, 30 ms, and 40 ms. If the packet length of the data packet is less than or equal to 40 ms and is equal to an available frame length during the lossless compression, it is determined that the frame length during the lossless compression is the packet length of the data packet. If the packet length of the data packet is not equal to an available frame length during the lossless compression, it is determined that the frame length processed currently is the longest available frame length less than the packet length of the data packet to be processed. For example, assume that the length of the data packet to be processed is 35 ms. Since 35 ms is not an available frame length of the G.711.0 encoder, it is determined that the frame length processed currently is 30 ms, which is the longest available frame length less than 35 ms. If the packet length of the data packet to be processed is greater than 40 ms during the lossless compression, it is determined that the frame length processed currently is the longest frame length during the lossless compression, or it is determined that the frame length processed currently is the second longest frame length corresponding to the longest frame length during the lossless compression.
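The rule described in the preceding paragraph can be sketched as a loop that repeatedly picks a frame length for whatever part of the data packet remains to be processed. This is an interpretation, not a definitive implementation: applying the rule repeatedly to the remaining length, restricting packet lengths to multiples of 5 ms, and the names `split_packet` and `use_second_longest` are all assumptions made for illustration.

```python
AVAILABLE_MS = (5, 10, 20, 30, 40)  # frame lengths G.711.0 can process
LONGEST_MS = max(AVAILABLE_MS)


def split_packet(pl: int, use_second_longest: bool = False) -> list[int]:
    """Split a packet of pl ms into lossless-compression frame lengths."""
    if pl <= 0 or pl % 5 != 0:
        # Assumption: packet lengths are positive multiples of 5 ms.
        raise ValueError("packet length must be a positive multiple of 5 ms")
    frames = []
    remaining = pl
    while remaining > 0:
        if remaining > LONGEST_MS:
            # Longer than any frame: take the longest (or second longest) length.
            fl = sorted(AVAILABLE_MS)[-2] if use_second_longest else LONGEST_MS
        elif remaining in AVAILABLE_MS:
            fl = remaining
        else:
            # Longest available frame length shorter than what is left.
            fl = max(a for a in AVAILABLE_MS if a < remaining)
        frames.append(fl)
        remaining -= fl
    return frames


print(split_packet(35))        # the 35 ms example from the text
print(split_packet(50))
print(split_packet(50, True))
```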

Alternatively, step 203 in which the G.711.0 encoder is adopted to compress the core layer bit stream may also be implemented in the following manners.

If the packet length of the data packet is less than or equal to 40 ms, the determining the frame length information during the lossless compression performed on the data packet is: determining that the frame length during the lossless compression is the packet length of the data packet, or determining the frame length as shown in the formula

$fl_{1pl \leq 40}(pl) = \begin{cases} pl,\ N = 1, & \text{if } pl = 5, 10, 20, 30, 40 \\ pl - 5,\ N = 2, & \text{if } pl < 40 \text{ and } pl \neq 5, 10, 20, 30. \end{cases}$

If a packet length of the packed data packet is greater than 40 ms, a frame length of a first frame of the packed data packet is

$fl_{1pl > 40} = \begin{cases} 30, & N = \left\lfloor \frac{pl}{30} \right\rfloor + fl_{1pl \leq 40}\left( pl - \left\lfloor \frac{pl}{30} \right\rfloor \times 30 \right) \\ 40, & N = \left\lfloor \frac{pl}{40} \right\rfloor + fl_{1pl \leq 40}\left( pl - \left\lfloor \frac{pl}{40} \right\rfloor \times 40 \right), \end{cases}$

where $\lfloor \cdot \rfloor$ is a round-down operator; a frame length of a second frame is 5 ms and a frame length of a third frame is 5 ms.

Alternatively, step 203 in which the G.711.0 encoder is adopted to compress the core layer bit stream may also be implemented in the following manner: after determining that the frame length of the first frame during the lossless compression is fl₁, determining a frame length, which is to be processed subsequently, according to

$\left\{ \begin{array}{l} \text{when } pl \text{ can be divided exactly by } fl_{1}\text{: } N = \frac{pl}{fl_{1}};\ fl_{n} = fl_{1},\ n \in \left[ 1, N \right]; \\ \text{when } pl \text{ cannot be divided exactly by } fl_{1}\text{:} \\ \quad N = \left\lfloor \frac{pl}{fl_{1}} \right\rfloor + fl_{1pl \leq 40}\left( pl - \left\lfloor \frac{pl}{fl_{1}} \right\rfloor \cdot fl_{1} \right); \\ \quad fl_{n} = fl_{1},\ n \in \left[ 1, \left\lfloor \frac{pl}{fl_{1}} \right\rfloor \right]; \\ \quad fl_{n} = fl_{1pl \leq 40}\left( pl - \left\lfloor \frac{pl}{fl_{1}} \right\rfloor \cdot fl_{1} \right),\ n \in \left[ \left\lfloor \frac{pl}{fl_{1}} \right\rfloor + 1, N \right]. \end{array} \right.$
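One possible reading of the formula above is sketched below. As many frames of length fl₁ as fit are taken first, and the remainder, which is shorter than fl₁ and therefore shorter than 40 ms, is split by the short-packet rule (the remainder itself if it is an available frame length, otherwise the remainder minus 5 ms followed by 5 ms). Treating the term added to the quotient in the expression for N as the number of frames needed for the remainder is an interpretation, and the function name `frames_from_first` and the restriction to multiples of 5 ms are assumptions.

```python
AVAILABLE_MS = (5, 10, 20, 30, 40)  # frame lengths G.711.0 can process


def frames_from_first(pl: int, fl1: int) -> list[int]:
    """List the frame lengths for a packet of pl ms once the first frame is fl1 ms."""
    if fl1 not in AVAILABLE_MS or pl < fl1 or pl % 5 != 0:
        raise ValueError("need an available fl1 and pl >= fl1 in multiples of 5 ms")
    whole, rest = divmod(pl, fl1)
    frames = [fl1] * whole
    if rest:
        # The remainder is shorter than fl1: apply the short-packet rule to it.
        if rest in AVAILABLE_MS:
            frames.append(rest)
        else:
            frames.extend([rest - 5, 5])
    return frames


print(frames_from_first(60, 20))  # pl divides exactly by fl1
print(frames_from_first(75, 20))  # remainder of 15 ms is split further
```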

Moreover, the frame length information during the lossless compression performed on the data packet may also be determined by combining the foregoing two manners. Certainly, the foregoing manners are described as examples only and are not intended to limit the manner of determining the frame length information during the lossless compression in the embodiment of the present invention.

Based on the embodiments shown in FIG. 1 and FIG. 2, the packed voice code stream may also be sent to a transmission network or a storage device.

FIG. 3 is a schematic structural diagram of an embodiment of a voice signal encoding device of the present invention. As shown in FIG. 3, this embodiment includes a first processing module 31, a second processing module 32, and a third processing module 33.

The first processing module 31 encodes an input voice signal to obtain a broadband code stream, where the broadband code stream includes a core layer bit stream and an extension enhancement layer bit stream. The second processing module 32 compresses the core layer bit stream to obtain a compressed code stream. The third processing module 33 packs the compressed code stream and the extension enhancement layer bit stream to obtain a packed code stream.

In the voice signal encoding device provided by the embodiment of the present invention, the second processing module 32 compresses the core layer bit stream, and the third processing module 33 packs the compressed code stream and the extension enhancement layer bit stream, thereby reducing the transmission bandwidth occupied by the input voice signal. Since broadband voice encoding is performed on the input voice signal, a broadband voice code stream is transmitted by using narrowband transmission bandwidth, thereby improving the cost performance of voice signal transmission.

FIG. 4 is a schematic structural diagram of another embodiment of a voice signal encoding device of the present invention. As shown in FIG. 4, this embodiment includes a first processing module 41, a second processing module 42, a third processing module 43, and a sending module 44.

The first processing module 41 encodes an input voice signal to obtain a broadband code stream, where the broadband code stream includes a core layer bit stream and an extension enhancement layer bit stream. The second processing module 42 compresses the core layer bit stream to obtain a compressed code stream. The third processing module 43 packs the compressed code stream and the extension enhancement layer bit stream to obtain a packed code stream. The sending module 44 sends the packed voice code stream to a network or a storage device.

Furthermore, the second processing module 42 may further include a first recombination unit 421, a first determination unit 422, a compression unit 423, a second determination unit 424, and a third determination unit 425. The first recombination unit 421 combines core layer bit streams in at least two frames to obtain a combined core layer bit stream. The first determination unit 422 determines frame length information during lossless compression performed on a data packet. The compression unit 423 performs lossless compression on the data packet by using the frame length information, to obtain the compressed code stream.

Furthermore, when a packet length of the data packet to be processed is less than or equal to a longest frame length during the lossless compression, if the packet length of the data packet is equal to an available frame length during the lossless compression, the first determination unit 422 determines that a frame length during the lossless compression is the packet length of the data packet; if the packet length of the data packet is not equal to an available frame length during the lossless compression, the first determination unit 422 determines that a frame length processed currently is a longest available frame length less than the packet length of the data packet to be processed.

If a packet length of the data packet to be processed is greater than a longest frame length during the lossless compression, the first determination unit 422 determines that a frame length processed currently is the longest frame length during the lossless compression; or determines that a frame length processed currently is a second longest frame length corresponding to the longest frame length during the lossless compression.

Furthermore, the second determination unit 424 determines a frame length of a first frame during the lossless compression. If the packet length of the data packet is an integral multiple of the frame length of the first frame, the second determination unit 424 determines that the frame length of the remaining frames during the lossless compression is the frame length of the first frame. When the packet length of the data packet is not an integral multiple of the frame length of the first frame, if the packet length of the data packet to be processed is greater than or equal to the frame length of the first frame, the second determination unit 424 determines that the frame length processed currently is equal to the frame length of the first frame; if the packet length of the data packet to be processed is less than the frame length of the first frame, the second determination unit 424 determines that the frame length processed currently is the longest available frame length less than the packet length of the data packet to be processed.

Furthermore, the third determination unit 425 determines frame length information during lossless compression performed on the data packet, according to a type of a voice transmission network and/or a type of the input voice signal.

Furthermore, the third processing module 43 may further include a second recombination unit 431 and an addition unit 432. The second recombination unit 431 recombines the compressed code stream and the extension enhancement layer bit stream to form a recombined code stream. The addition unit 432 adds header information including side information into the recombined code stream to obtain the packed voice code stream.

In the voice signal encoding device provided by the embodiment of the present invention, the second processing module 42 compresses the core layer bit stream, and the third processing module 43 packs the compressed code stream and the extension enhancement layer bit stream, thereby reducing the transmission bandwidth occupied by the input voice signal. Since broadband voice encoding is performed on the input voice signal, a broadband voice code stream is transmitted by using narrowband transmission bandwidth, thereby improving the cost performance of voice signal transmission.

FIG. 5 is a schematic flow chart of an embodiment of a voice signal decoding method of the present invention. As shown in FIG. 5, the embodiment of the present invention includes the following steps.

Step 501: Acquire header information in a packed code stream.

Step 502: Unpack the packed code stream according to the header information, to obtain an extension enhancement layer bit stream and a compressed core layer bit stream.

Step 503: Decompress the compressed core layer bit stream to obtain a decompressed code stream.

Step 504: Perform decoding reestablishment on the extension enhancement layer bit stream and the decompressed code stream, to obtain a broadband reestablished voice signal.

In the voice signal decoding method provided by the embodiment of the present invention, the packed code stream is unpacked to obtain the extension enhancement layer bit stream and the compressed core layer bit stream, and the compressed core layer bit stream is decompressed to obtain the decompressed code stream, so as to implement the effect of transmitting broadband voice by using narrowband transmission bandwidth, thereby improving the cost performance of voice signal transmission.
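Steps 501 and 502 can be sketched as reading a header and slicing the payload accordingly. The header layout used here is hypothetical and is not defined by this embodiment: a 16-bit packet length in ms, an 8-bit frame count N, a 16-bit byte length of the compressed core layer bit stream, then N one-byte frame lengths, all in network byte order. The function name `unpack_code_stream` is also an assumption.

```python
import struct


def unpack_code_stream(packet: bytes) -> tuple[int, list[int], bytes, bytes]:
    """Return (pl in ms, frame lengths in ms, compressed core stream, extension bits)."""
    # Step 501: acquire the (hypothetical) fixed part of the header.
    pl, n, comp_len = struct.unpack_from("!HBH", packet, 0)
    offset = struct.calcsize("!HBH")
    # Read the N per-frame lengths that follow the fixed part.
    frame_lens = list(struct.unpack_from(f"!{n}B", packet, offset))
    offset += n
    if sum(frame_lens) != pl:
        raise ValueError("frame lengths do not add up to the packet length")
    # Step 502: the compressed core stream comes first, the extension bits behind it.
    compressed = packet[offset:offset + comp_len]
    ext_block = packet[offset + comp_len:]
    return pl, frame_lens, compressed, ext_block
```

The returned frame lengths are what a lossless decompressor would need in order to process the compressed core layer bit stream frame by frame in step 503.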

FIG. 6 is a schematic flow chart of another embodiment of a voice signal decoding method of the present invention. As shown in FIG. 6, the embodiment of the present invention includes the following steps.

Step 601: Acquire header information in a packed code stream.

The header information includes side information. In an actual application, the side information may include header information in the RTP, and may also include payload header information (Payload Header) in the RTP, where the payload header information may be encoding mode information of a G.711.1 encoder.

Step 602: Acquire the side information included in the header information.

Step 603: Unpack the packed code stream according to the side information, to obtain an extension enhancement layer bit stream and a compressed core layer bit stream.

Step 604: Acquire frame length information during lossless decompression.

In the case in which a code stream of multiple frames is packed into a data packet, a decoding end may use information that is carried in the header information, and from which the packet length of the data packet and the number of frames included in the packet can be calculated, to obtain the processing frame length for each lossless decoding operation in the data packet.

Step 605: Perform the lossless decompression on the compressed core layer bit stream according to the frame length information, to obtain the decompressed code stream.

Step 606: De-recombine the extension enhancement layer bit stream and the decompressed code stream, to obtain a broadband code stream.

Step 607: Decode the broadband code stream to obtain the broadband reestablished voice signal.

In this embodiment, the core layer bit stream may be specifically a narrowband voice code stream. The narrowband voice code stream is obtained by encoding a narrowband signal. The extension enhancement layer bit stream may specifically include a narrowband enhancement bit stream and/or a broadband enhancement bit stream. The narrowband enhancement bit stream may be specifically an enhancement bit stream with narrowband voice encoding quality. The broadband enhancement bit stream may be specifically an enhancement bit stream with broadband voice encoding quality.

In the voice signal decoding method provided by the embodiment of the present invention, the packed code stream is unpacked to obtain the extension enhancement layer bit stream and the compressed core layer bit stream, and the compressed core layer bit stream is decompressed to obtain the decompressed code stream, so as to implement the effect of transmitting broadband voice by using narrowband transmission bandwidth, thereby improving the cost performance of voice signal transmission.

FIG. 7 is a schematic structural diagram of an embodiment of a voice signal decoding device of the present invention. As shown in FIG. 7, this embodiment includes an acquisition module 71, an unpacking module 72, a decompression module 73 and a reestablishment module 74.

The acquisition module 71 acquires header information in a packed code stream. The unpacking module 72 unpacks the packed code stream according to the header information to obtain an extension enhancement layer bit stream and a compressed core layer bit stream. The decompression module 73 decompresses the compressed core layer bit stream to obtain a decompressed code stream. The reestablishment module 74 performs decoding reestablishment on the extension enhancement layer bit stream and the decompressed code stream, to obtain a broadband reestablished voice signal.

In the voice signal decoding device provided by the embodiment of the present invention, the unpacking module 72 unpacks the packed code stream to obtain the extension enhancement layer bit stream and the compressed core layer bit stream, and the decompression module 73 decompresses the compressed core layer bit stream to obtain the decompressed code stream, so as to implement the effect of transmitting broadband voice by using narrowband transmission bandwidth, thereby improving the cost performance of voice signal transmission.

FIG. 8 is a schematic structural diagram of another embodiment of a voice signal decoding device of the present invention. As shown in FIG. 8, this embodiment includes an acquisition module 81, an unpacking module 82, a decompression module 83 and a reestablishment module 84.

The acquisition module 81 acquires header information in a packed code stream. The unpacking module 82 unpacks the packed code stream according to the header information to obtain an extension enhancement layer bit stream and a compressed core layer bit stream. The decompression module 83 decompresses the compressed core layer bit stream, to obtain a decompressed code stream. The reestablishment module 84 performs decoding reestablishment on the extension enhancement layer bit stream and the decompressed code stream, to obtain a broadband reestablished voice signal.

Furthermore, the unpacking module 82 may further include a first acquisition unit 821 and an unpacking unit 822. The first acquisition unit 821 acquires side information included in the header information. The unpacking unit 822 unpacks the packed code stream according to the side information, to obtain the extension enhancement layer bit stream and the compressed core layer bit stream.

Furthermore, the decompression module 83 may further include a second acquisition unit 831 and a decompression unit 832. The second acquisition unit 831 acquires frame length information during lossless decompression. The decompression unit 832 performs the lossless decompression on the compressed core layer bit stream according to the frame length information, to obtain the decompressed code stream.

Furthermore, the reestablishment module 84 may further include a de-recombination unit 841 and a decoding unit 842. The de-recombination unit 841 de-recombines the extension enhancement layer bit stream and the decompressed code stream, to obtain a broadband code stream. The decoding unit 842 decodes the broadband code stream, to obtain the broadband reestablished voice signal.

In the voice signal decoding device provided by the embodiment of the present invention, the unpacking module 82 unpacks the packed code stream to obtain the extension enhancement layer bit stream and the compressed core layer bit stream, and the decompression module 83 decompresses the compressed core layer bit stream to obtain the decompressed code stream, so as to implement the effect of transmitting broadband voice by using narrowband transmission bandwidth, thereby improving the cost performance of voice signal transmission.

FIG. 9 is a schematic structural diagram of an embodiment of a voice codec system of the present invention. As shown in FIG. 9, this embodiment includes a voice signal encoding device 91 and a voice signal decoding device 92.

The voice signal encoding device 91 encodes an input voice signal to obtain a broadband code stream, where the broadband code stream includes a core layer bit stream and an extension enhancement layer bit stream; compresses the core layer bit stream to obtain a compressed code stream; packs the compressed code stream and the extension enhancement layer bit stream to obtain a packed code stream; and sends the packed code stream to the voice signal decoding device 92.

The voice signal decoding device 92 acquires header information from the packed code stream sent by the voice signal encoding device 91; unpacks the packed code stream according to the header information to obtain the extension enhancement layer bit stream and a compressed core layer bit stream; decompresses the compressed core layer bit stream, to obtain a decompressed code stream; and performs decoding reestablishment on the extension enhancement layer bit stream and the decompressed code stream, to obtain a broadband reestablished voice signal.

In the voice codec system provided by the embodiment of the present invention, the voice signal encoding device 91 compresses the core layer bit stream, packs the compressed code stream and the extension enhancement layer bit stream, and sends the packed voice code stream to the voice signal decoding device 92, thereby reducing transmission bandwidth occupied by the input voice signal. Since the broadband voice encoding is performed on the input voice signal, a broadband voice code stream is transmitted by using narrowband transmission bandwidth, thereby improving the cost performance of voice signal transmission.

In order to illustrate the technical solution of the embodiment of the present invention more clearly, the codec system applicable in the embodiment of the present invention is described in detail below.

FIG. 10 is a schematic structural diagram of a system applicable in an embodiment of the present invention. As shown in FIG. 10, in this embodiment, an encoding end includes a voice signal encoding device 11 in the embodiment shown in FIG. 9, and a decoding end includes a voice signal decoding device 12 in the embodiment shown in FIG. 9. The voice signal encoding device 11 includes a first encoder 111, a second encoder 112, a recombination module 113, and a packing module 114. The voice signal decoding device 12 includes a first decoder 121, a second decoder 122, a de-recombination module 123, and an unpacking module 124.

At the encoding end, the first encoder 111 performs broadband voice encoding on an input voice signal to obtain a broadband voice code stream, where the broadband voice code stream includes a core layer bit stream and an extension enhancement layer bit stream; the core layer bit stream is recombined by the recombination module 113 and then input into, or is directly input into, the second encoder 112 for lossless compression, to generate a losslessly compressed code stream; the packing module 114 packs the compressed code stream and the extension enhancement layer bit stream to obtain a packed voice code stream and transmits the packed voice code stream to the decoding end through a network. Specifically, if the first encoder 111 is a G.711.1 encoder and the second encoder 112 is a G.711.0 encoder, the core layer bit stream formed after the first encoder 111 encodes the input voice signal is a G.711 bit stream (bits), and the extension enhancement layer bit stream is a G.711.1 extension bit stream (ext bits). The G.711 bit stream (bits) is recombined by the recombination module 113 and then is input into the second encoder 112, and the second encoder 112 performs the lossless compression on the recombined G.711 bit stream (bits) to obtain a G.711.0 bit stream (bits). The packing module 114 packs the G.711.1 extension bit stream (ext bits) and the G.711.0 bit stream (bits) and then transmits the packed bit stream to the decoding end through the network.
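The recombination and lossless compression at the encoding end can be sketched as follows for one data packet. This is a minimal illustration under stated assumptions rather than an implementation of the G.711.0 encoder: zlib is used only as a stand-in lossless coder, each input frame is assumed to carry its core layer bytes first, and the names CORE_BYTES and encode_packet are hypothetical. The value of 40 core bytes per frame assumes a 5 ms frame of 64 kbit/s G.711 data.

```python
import zlib
from typing import List, Tuple

# Assumption: a 5 ms frame of 64 kbit/s G.711 core data occupies 40 bytes.
CORE_BYTES = 40


def encode_packet(frames: List[bytes]) -> Tuple[bytes, bytes]:
    """Return (compressed core layer, concatenated extension layer) for one packet."""
    # Recombination: gather the core layer bytes of all frames in the order of time.
    core = b"".join(frame[:CORE_BYTES] for frame in frames)
    # The extension enhancement layer bytes are passed through uncompressed.
    ext = b"".join(frame[CORE_BYTES:] for frame in frames)
    # Lossless compression of the recombined core layer (zlib is a stand-in only).
    compressed = zlib.compress(core)
    return compressed, ext
```

The packing step would then prepend header information to the two returned parts before transmission.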

At the decoding end, the unpacking module 124 unpacks the received voice code stream, to obtain the extension enhancement layer bit stream and a compressed core layer bit stream; the second decoder 122 decodes the compressed core layer bit stream, to obtain a decompressed code stream; the de-recombination module 123 de-recombines the extension enhancement layer bit stream and the decompressed code stream, to obtain a de-recombined code stream; and the first decoder 121 performs corresponding decoding on the de-recombined code stream to restore the voice signal. Specifically, if the first decoder 121 is a G.711.1 decoder and the second decoder 122 is a G.711.0 decoder, the voice code stream is unpacked by the unpacking module 124 to obtain the G.711.0 bit stream and the G.711.1 extension bit stream (ext bits). The G.711.0 bit stream is decoded by the second decoder 122 to form the G.711 bit stream (bits) and the G.711.1 extension bit stream (ext bits). The G.711 bit stream (bits) is de-recombined by the de-recombination module 123 and then input into the first decoder 121. The first decoder 121 performs lossless decoding on the de-recombined code stream and the G.711.1 extension bit stream and then outputs the voice signal.

FIG. 11 is a schematic diagram of a code stream formed at the encoding end in the embodiment shown in FIG. 10. FIG. 12 is a schematic diagram of a code stream formed at the decoding end in the embodiment shown in FIG. 10. This embodiment is described by using an example in which four frames of data are packed into one data packet. Description is given in the following in combination with the embodiment shown in FIG. 10.

As shown in FIG. 11, at the encoding end, the input voice signal is encoded by the G.711.1 encoder to obtain the G.711.1 bit stream (G.711.1 bits). The G.711 bit streams of every four frames are combined together. The G.711.0 encoder compresses the combined G.711 bit stream in the order of time. Specifically, processing frame lengths of the G.711.0 encoder may be 5 ms, 10 ms, 20 ms, 30 ms, and 40 ms, and the longer the frame length of the G.711.0 encoder is, the higher the compression rate is. If two, four, six, and eight G.711 bit streams of 5 ms each are combined together according to a packet length and each are compressed by the G.711.0 encoder, corresponding frame lengths of the G.711.0 encoder are 10 ms, 20 ms, 30 ms, and 40 ms respectively. In this case, according to the packet length of the data packet and the compression characteristics of the G.711.0 encoder for different frame lengths, a frame length fl₁ of a first frame can be set, and then, according to the frame length fl₁ of the first frame, the combined G.711 bit stream is encoded in the order of time by using the G.711.0 encoder. When a data length of the G.711 bit stream to be processed is less than the frame length fl₁ of the first frame, the longest of the applicable frame lengths that are less than the remaining packet length (the second longest frame length in this embodiment) may be selected. The processing is performed according to this principle until all data is processed, which may be expressed by the following formulas. When the packet length of the data packet is less than or equal to 40 ms, the frame length fl₁ of the first frame is determined according to

${{fl}_{{1{pl}} \leq 40}({pl})} = \left\{ \begin{matrix}{{pl},{N = 1},{{{if}\mspace{14mu} {pl}} = 5},10,20,30,40} \\{{{pl} - 5},{N = 2},{{{if}\mspace{14mu} {pl}} < {40\mspace{14mu} {and}\mspace{14mu} {pl}} \neq 5},10,20,30,}\end{matrix} \right.$

so the frame length fl₁ of the first frame and the number N of frames during the lossless compression performed by the G.711.0 encoder can be determined according to the packet length of the data packet. When the packet length pl of the data packet is greater than 40 ms,

${fl}_{{1{pl}} > 40} = \left\{ \begin{matrix}{30,{N = {\left\lfloor \frac{pl}{30} \right\rfloor + {{fl}_{{1{pl}} \leq 40}\left( {{pl} - {\left\lfloor \frac{pl}{30} \right\rfloor \times 30}} \right)}}}} \\{40,{N = {\left\lfloor \frac{pl}{40} \right\rfloor + {{fl}_{{1{pl}} \leq 40}\left( {{pl} - {\left\lfloor \frac{pl}{40} \right\rfloor \times 40}} \right)}}},}\end{matrix} \right.$

where ⌊·⌋ is the round-down operator. In this case, whether to select 30 or 40 can be determined by calculating the number of bits saved over the whole data packet, so that the frame length with which more bits are saved is used. Moreover, the calculation may be performed according to actual conditions or estimated from empirical values.
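The selection rules above can be sketched as follows. The sketch is an interpretation under stated assumptions, not a definitive implementation: packet lengths are assumed to be multiples of 5 ms, the term added to the round-down quotient in the formula for N is read as the number of frames needed for the remaining data, and the names ALLOWED, schedule_short, schedule_long, choose, and the caller-supplied saved_bits estimate are hypothetical.

```python
from typing import Callable, List

ALLOWED = (5, 10, 20, 30, 40)  # processing frame lengths in ms, from the description


def schedule_short(pl: int) -> List[int]:
    """Frame lengths (ms) for a packet length pl of at most 40 ms."""
    if pl < 0 or pl > 40 or pl % 5 != 0:
        raise ValueError("pl must be a multiple of 5 between 0 and 40")
    if pl == 0:
        return []
    if pl in ALLOWED:
        return [pl]          # first case: fl1 = pl, N = 1
    return [pl - 5, 5]       # second case: fl1 = pl - 5, N = 2


def schedule_long(pl: int, fl1: int) -> List[int]:
    """Frame lengths (ms) for pl greater than 40 ms with fl1 equal to 30 or 40."""
    full, rest = divmod(pl, fl1)
    return [fl1] * full + schedule_short(rest)


def choose(pl: int, saved_bits: Callable[[int], float]) -> List[int]:
    """Pick the 30 ms or 40 ms schedule that saves more bits over the whole packet.

    saved_bits(fl) is a hypothetical estimate of the bits saved by one frame
    of length fl, for example derived from empirical values.
    """
    candidates = [schedule_long(pl, fl1) for fl1 in (30, 40)]
    return max(candidates, key=lambda frames: sum(saved_bits(fl) for fl in frames))
```

For a 70 ms packet, the two candidates are 30 + 30 + 10 ms and 40 + 30 ms, and the estimate supplied by the caller decides between them.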

In the foregoing example, the frame length fl₁ of the first frame of the G.711.0 encoder is 20 ms (four frames), so that bandwidth can be saved as much as possible. Moreover, if an actual application, for example a conference call application system, requires only a narrowband G.711 bit stream, the code stream is easier to truncate.

After the G.711.0 bit stream and the G.711.1 extension bit stream are combined, header information including side information is added into the combined code stream, and the code stream is input into a transmission network or a storage device. The side information may include a G.711.1 Payload Header (including encoding mode information) and RTP packet header information. Moreover, the side information may also include information that can be used for calculating a packet length pl and the number N of frames during the lossless compression.

As shown in FIG. 12, at a receiving end, the side information is acquired from a packet header. The G.711.0 decoder decodes, based on the side information, the G.711.0 bit stream in the voice code stream, to obtain the G.711 bit stream, where the processing frame length of the G.711.0 decoder is consistent with that at the encoding end. The G.711 bit stream is divided into independent 5 ms frames. The G.711 bit stream and the G.711.1 extension bit stream of each frame are combined as a G.711.1 code stream. The G.711.1 code stream is decoded by the G.711.1 decoder to obtain a reestablished voice.
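The division into 5 ms frames and the recombination with the extension bits at the receiving end can be sketched as follows. The sketch assumes that 5 ms of G.711 data occupies 40 bytes (8000 samples per second at one byte per sample), that the extension bytes of each frame have a fixed size passed in by the caller, and that the core bytes precede the extension bytes in each recombined frame. The names split_core and recombine are hypothetical.

```python
from typing import List

CORE_BYTES_PER_FRAME = 40  # 5 ms of G.711: 8000 samples/s * 1 byte * 0.005 s


def split_core(core: bytes) -> List[bytes]:
    """Divide the decompressed G.711 bit stream into independent 5 ms frames."""
    if len(core) % CORE_BYTES_PER_FRAME != 0:
        raise ValueError("core length is not a whole number of 5 ms frames")
    return [core[i:i + CORE_BYTES_PER_FRAME]
            for i in range(0, len(core), CORE_BYTES_PER_FRAME)]


def recombine(core: bytes, ext: bytes, ext_bytes_per_frame: int) -> List[bytes]:
    """Pair each 5 ms core frame with its extension bytes (core first, by assumption)."""
    cores = split_core(core)
    if len(ext) != len(cores) * ext_bytes_per_frame:
        raise ValueError("extension length does not match the number of frames")
    exts = [ext[i * ext_bytes_per_frame:(i + 1) * ext_bytes_per_frame]
            for i in range(len(cores))]
    return [c + e for c, e in zip(cores, exts)]
```

Each element of the returned list corresponds to one frame that would be handed to the broadband decoder.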

FIG. 13 is another schematic diagram of a code stream formed at the encoding end in the embodiment shown in FIG. 10. FIG. 14 is another schematic diagram of a code stream formed at the decoding end in the embodiment shown in FIG. 10. This embodiment is described by using an example in which four frames of data are packed into one data packet. Description is given in the following in combination with the embodiment shown in FIG. 10.

As shown in FIG. 13, at the encoding end, the input voice signal is encoded by the G.711.1 encoder to obtain the G.711.1 bit stream. The G.711 bit streams of all frames are combined together. The G.711.0 encoder compresses the combined G.711 bit stream in the order of time. Processing frame lengths of the G.711.0 encoder may be 5 ms, 10 ms, 20 ms, 30 ms, and 40 ms, and the longer the frame length of the G.711.0 encoder is, the higher the compression rate is.

If two, four, six, and eight G.711 bit streams of 5 ms each are combined together and each are compressed by the G.711.0 encoder, that is, corresponding frame lengths of the G.711.0 encoder are 10 ms, 20 ms, 30 ms, and 40 ms, the bit stream is further compressed to reduce bandwidth occupation. For example, a frame length less than a packet length is selected as a frame length fl₁ of a first frame, and then the combined G.711 bit stream is encoded by the G.711.0 encoder according to the frame length fl₁ of the first frame in the order of time. When a data length of the G.711 bit stream to be processed is less than the frame length fl₁ of the first frame, a longest frame length in frame lengths less than the data length of the G.711 bit stream may be selected. Processing is performed according to this principle until all data is processed. For example, the number N of frames and the frame length flₙ of each frame are expressed by the following formula:

$\begin{cases} \text{when } pl \text{ can be divided exactly by } fl_1\text{:} \\ \quad N = \frac{pl}{fl_1};\ fl_n = fl_1,\ n \in [1, N]; \\ \text{when } pl \text{ cannot be divided exactly by } fl_1\text{:} \\ \quad N = \left\lfloor \frac{pl}{fl_1} \right\rfloor + fl_{1,\,pl \leq 40}\left( pl - \left\lfloor \frac{pl}{fl_1} \right\rfloor \cdot fl_1 \right); \\ \quad fl_n = fl_1,\ n \in \left[ 1, \left\lfloor \frac{pl}{fl_1} \right\rfloor \right]; \\ \quad fl_n = fl_{1,\,pl \leq 40}\left( pl - \left\lfloor \frac{pl}{fl_1} \right\rfloor \cdot fl_1 \right),\ n \in \left[ \left\lfloor \frac{pl}{fl_1} \right\rfloor + 1, N \right]. \end{cases}$

In this processing manner, bandwidth can be saved as much as possible, and moreover, if in an application, for example, in a conference call application system, only a narrowband G.711 bit stream is required, it is easier to truncate a code stream.
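The frame-by-frame principle described for this embodiment can be sketched as a simple greedy loop. This is a minimal sketch under stated assumptions: packet lengths are multiples of 5 ms, and when the remaining data is shorter than the first frame length, the longest applicable frame length not exceeding the remaining data is used (an assumption chosen so that a remainder of, for example, 10 ms is coded as one 10 ms frame). The names ALLOWED and greedy_schedule are hypothetical.

```python
from typing import List

ALLOWED = (5, 10, 20, 30, 40)  # processing frame lengths in ms, from the description


def greedy_schedule(pl: int, fl1: int) -> List[int]:
    """Return the frame length (ms) used for each lossless coding pass over a packet."""
    if fl1 not in ALLOWED:
        raise ValueError("fl1 must be one of the applicable frame lengths")
    if pl <= 0 or pl % 5 != 0:
        raise ValueError("pl must be a positive multiple of 5 ms")
    frames = []
    remaining = pl
    while remaining > 0:
        if remaining >= fl1:
            fl = fl1  # enough data left: keep using the first frame length
        else:
            # Assumption: longest applicable length that still fits the remainder.
            fl = max(f for f in ALLOWED if f <= remaining)
        frames.append(fl)
        remaining -= fl
    return frames
```

When the packet length is an exact multiple of the first frame length, every frame uses that length; otherwise the tail of the packet is covered by shorter frames.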

When the G.711 bit streams of 5 ms are combined together and each are compressed by the G.711.0 encoder at a frame length of 5 ms, if a bit error exists, more valid voice packets can be decoded. If in an actual application, for example, in the conference call application system, only a G.711 bit stream is required, it is easier to truncate a code stream.

The frame length to be processed by the G.711.0 encoder may also be adaptively determined according to a type of a voice transmission network or a type of the input voice signal. For example, the packet length of the data packet corresponding to the core layer bit streams combined by the recombination module is 20 ms, where in a first 10 ms, the data is a silence signal, and in a second 10 ms, the data is a voice signal. In this case, compression is separately performed on the first 10 ms data, which can achieve higher compression efficiency. Therefore, the G.711.0 encoder can use a frame length of 10 ms, so two data frames corresponding to the G.711.0 encoder are formed. Moreover, if a lot of bit errors exist in network transmission, a short frame is adopted as much as possible; otherwise, a long frame is adopted as much as possible. If a signal type is silence, the long frame may be adopted, and if the signal type is a voice, the short frame may be adopted.
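The signal-type example above (10 ms of silence followed by 10 ms of voice) can be illustrated by grouping consecutive 5 ms frames of the same type, so that each run can be compressed separately. This is only an illustrative sketch: the per-frame silence flags are assumed to come from some detector that is not described here, and the names FRAME_MS and runs_by_signal_type are hypothetical.

```python
from itertools import groupby
from typing import List, Tuple

FRAME_MS = 5  # duration of one core layer frame, as in the description


def runs_by_signal_type(is_silence: List[bool]) -> List[Tuple[bool, int]]:
    """Group consecutive 5 ms frames of the same type.

    Returns one (is_silence, duration_ms) pair per run, in the order of time.
    """
    return [(flag, FRAME_MS * len(list(group)))
            for flag, group in groupby(is_silence)]
```

A run whose duration is not one of the applicable frame lengths would still have to be split into applicable lengths before lossless compression.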

After the G.711.0 bit stream and the G.711.1 extension bit stream are combined, a header including side information is added into the combined code stream, and the code stream is input into a transmission network. The side information may include the Payload Header (including encoding mode information) of a G.711.1 encoder and RTP packet header information. Moreover, the side information may also include information that can be used for calculating a packet length pl, the number N of frames during the lossless compression, and so on.

As shown in FIG. 14, at the receiving end, the side information is acquired from the header information, where the side information may include G.711.1 encoding mode information, RTP header information, or G.711.0 encoding frame length information, and the side information may also be acquired from the SDP. The G.711.0 encoding frame length information is obtained based on the side information, and the G.711.0 decoder decodes the G.711.0 bit stream in the input code stream, to obtain the G.711 bit stream. The processing frame length of the G.711.0 decoder is consistent with that at the encoding end. The G.711 bit stream is divided into independent 5 ms frames. The G.711 bit stream and the G.711.1 extension bit stream of each frame are combined as the G.711.1 code stream. The G.711.1 code stream is decoded by the G.711.1 decoder to obtain a reestablished voice.

FIG. 15 is yet another schematic diagram of a code stream formed at the encoding end in the embodiment shown in FIG. 10. FIG. 16 is yet another schematic diagram of a code stream formed at the decoding end in the embodiment shown in FIG. 10. This embodiment is described by using an example in which four frames of data are packed into one data packet. Description is given in the following in combination with the embodiment shown in FIG. 10.

As shown in FIG. 15, at the encoding end, the input voice signal is encoded by the G.711.1 encoder to obtain the G.711.1 code stream. The G.711.0 encoder compresses the G.711 bit streams in the G.711.1 code stream at the frame length of 5 ms in the order of time. The G.711.0 bit stream and the G.711.1 extension bit stream are combined and packed into a data packet. Header information including side information is added into the data packet, and the data packet is input into a transmission network. The side information may include a G.711.1 Payload Header (including encoding mode information) and RTP packet header information. Moreover, the side information may also include information that can be used for calculating a packet length pl, the number N of frames during lossless compression, and so on.

As shown in FIG. 16, at the decoding end, the side information is acquired from the header information. The G.711.0 decoder decodes, based on the side information, the G.711.0 bit stream in the input code stream, to obtain the G.711 bit stream of one frame. The processing frame length of the G.711.0 decoder is 5 ms. The G.711 bit stream and the G.711.1 extension bit stream of one frame are combined as a G.711.1 code stream of one frame. The G.711.1 code stream is decoded by the G.711.1 decoder to obtain a reestablished voice. Moreover, the de-recombination and decoding process may also be repeated until the data packet is empty.

The embodiment shown in FIG. 10 to FIG. 16 is specifically applied to the G.711.0 encoder and the G.711.1 encoder and can make full use of the advantages of the G.711.0 encoder and the G.711.1 encoder. Through a voice encoding transmission solution with the combination of the G.711.0 encoder and the G.711.1 encoder, bandwidth is saved and broadband voice quality is provided, thereby improving the cost performance of voice transmission of a network system, and meanwhile taking account of algorithm complexity in the design of the solution.

Furthermore, the embodiment shown in FIG. 10 to FIG. 16 in which the first encoder is specifically the G.711.1 encoder and the second encoder is specifically the G.711.0 encoder is exemplified, but implementation of the embodiments of the present invention is not limited to the foregoing situation that the first decoder is specifically the G.711.1 decoder and the second decoder is specifically the G.711.0 decoder. As long as functions of the codec of the technical solutions described in the embodiments of the present invention are implemented through a corresponding codec, the corresponding codec belongs to the technical solutions described in the embodiments of the present invention.

It can be clearly understood by persons skilled in the art that, for the purpose of convenient and brief description, for the detailed working process of the foregoing system, apparatus, module, and unit, reference may be made to the corresponding process in the method embodiments, and details are not described herein again.

Persons of ordinary skill in the art should understand that all or a part of the steps in the embodiments may be implemented by a program instructing relevant hardware. The program may be stored in a computer readable storage medium. When the program runs, the steps of the method according to the embodiments are performed. The storage medium includes various media capable of storing program codes, such as a ROM, a RAM, a magnetic disk, or an optical disk.

Finally, it should be noted that the foregoing embodiments are merely provided for describing the technical solutions of the present invention, but not intended to limit the present invention. It should be understood by persons of ordinary skill in the art that though the present invention has been described in detail with reference to the exemplary embodiments, modifications or equivalent replacements can be made to the technical solutions of the present invention, as long as the modifications or equivalent replacements do not make the modified technical solutions depart from the idea and scope of the technical solutions of the present invention.

What is claimed is:
 1. A voice signal encoding method, comprising: encoding an input voice signal to obtain a broadband code stream, wherein the broadband code stream comprises a core layer bit stream and an extension enhancement layer bit stream; compressing the core layer bit stream to obtain a compressed code stream; and packing the compressed code stream and the extension enhancement layer bit stream to obtain a packed code stream.
 2. The method according to claim 1, wherein the compressing the core layer bit stream to obtain the compressed code stream comprises: combining core layer bit streams in at least two data frames to obtain a data packet corresponding to a combined core layer bit stream; determining frame length information during lossless compression performed on the data packet; and performing the lossless compression on the data packet according to the frame length information, to obtain the compressed code stream.
 3. The method according to claim 2, wherein the determining the frame length information during the lossless compression performed on the data packet comprises: determining the frame length information during the lossless compression performed on the data packet, according to a compression characteristic corresponding to each frame length during the lossless compression and a packet length of the data packet.
 4. The method according to claim 2, wherein if a packet length of a data packet to be processed is less than or equal to a longest frame length during the lossless compression, the determining the frame length information during the lossless compression performed on the data packet is: if the packet length of the data packet is equal to an available frame length during the lossless compression, determining frame length information during lossless compression performed on the data packet; if the packet length of the data packet is not equal to an available frame length during the lossless compression, determining that a frame length processed currently is a longest available frame length less than the packet length of the data packet to be processed; if a packet length of a data packet to be processed is greater than a longest frame length during the lossless compression, the determining the frame length information during the lossless compression performed on the data packet is: determining that a frame length processed currently is the longest frame length during the lossless compression; or determining that a frame length processed currently is a second longest frame length corresponding to the longest frame length during the lossless compression.
 5. The method according to claim 2, wherein the determining the frame length information during the lossless compression performed on the data packet comprises: determining a frame length of a first frame during the lossless compression; if a packet length of the data packet is an integral multiple of the frame length of the first frame, determining that a frame length of remaining frames during the lossless compression is the frame length of the first frame; when a packet length of the data packet is not an integral multiple of the frame length of the first frame, if the packet length of the data packet to be processed is greater than or equal to the frame length of the first frame, determining that a frame length processed currently is equal to the frame length of the first frame; if the packet length of the data packet to be processed is less than the frame length of the first frame, determining that a frame length processed currently is a longest available frame length less than the packet length of the data packet to be processed.
 6. A voice signal encoding device, comprising: a first processing module, configured to encode an input voice signal to obtain a broadband code stream, wherein the broadband code stream comprises a core layer bit stream and an extension enhancement layer bit stream; a second processing module, configured to compress the core layer bit stream to obtain a compressed code stream; and a third processing module, configured to pack the compressed code stream and the extension enhancement layer bit stream to obtain a packed code stream.
 7. The device according to claim 6, wherein the second processing module comprises: a first recombination unit, configured to combine core layer bit streams in at least two data frames to obtain a combined core layer bit stream; a first determination unit, configured to determine frame length information during lossless compression performed on a data packet; and a compression unit, configured to perform the lossless compression on the data packet according to the frame length information, to obtain the compressed code stream.
 8. The device according to claim 7, wherein, when the first determination unit determines the frame length information during the lossless compression performed on the data packet: when a packet length of the data packet to be processed is less than or equal to a longest frame length during the lossless compression, if the packet length of the data packet to be processed is equal to an available frame length during the lossless compression, the first determination unit is configured to determine that a frame length during the lossless compression is the packet length of the data packet; if the packet length of the data packet to be processed is not equal to an available frame length during the lossless compression, the first determination unit is configured to determine that a frame length processed currently is a longest available frame length less than the packet length of the data packet to be processed; if a packet length of the data packet to be processed is greater than a longest frame length during the lossless compression, the first determination unit is configured to determine that a frame length processed currently is the longest frame length during the lossless compression; or determine that a frame length processed currently is a second longest frame length corresponding to the longest frame length during the lossless compression.
 9. The device according to claim 6, wherein the second processing module comprises: a second determination unit, configured to determine frame length information of a first frame during lossless compression; wherein if a packet length of the data packet is an integral multiple of a frame length of the first frame, the second determination unit is configured to determine that a frame length of remaining frames during the lossless compression is the frame length of the first frame; when a packet length of the data packet to be processed is not an integral multiple of a frame length of the first frame, if the packet length of the data packet to be processed is greater than or equal to the frame length of the first frame, the second determination unit is configured to determine that a frame length processed currently is equal to the frame length of the first frame; if the packet length of the data packet to be processed is less than the frame length of the first frame, the second determination unit is configured to determine that a frame length processed currently is a longest available frame length less than the packet length of the data packet to be processed.
 10. The device according to claim 7, wherein the second processing module comprises a third determination unit, and the third determination unit is configured to determine the frame length information during the lossless compression performed on the data packet, according to a type of a voice transmission network or a type of the input voice signal.
 11. The device according to claim 7, wherein the third processing module comprises: a second recombination unit, configured to recombine the compressed code stream and the extension enhancement layer bit stream to form a recombined code stream; and an addition unit, configured to add header information comprising side information into the recombined code stream, to obtain the packed code stream.
 12. A voice signal decoding method, comprising: acquiring header information in a packed code stream; unpacking the packed code stream according to the header information, to obtain an extension enhancement layer bit stream and a compressed core layer bit stream; decompressing the compressed core layer bit stream to obtain a decompressed code stream; and performing decoding reestablishment on the extension enhancement layer bit stream and the decompressed code stream, to obtain a broadband reestablished voice signal.
 13. The method according to claim 12, wherein the unpacking the packed code stream according to the header information to obtain the extension enhancement layer bit stream and the compressed core layer bit stream comprises: acquiring side information comprised in the header information; and unpacking the packed code stream according to the side information, to obtain the extension enhancement layer bit stream and the compressed core layer bit stream.
 14. The method according to claim 12, wherein the decompressing the compressed core layer bit stream to obtain the decompressed code stream comprises: acquiring frame length information during lossless decompression; and performing the lossless decompression on the core layer bit stream according to the frame length information, to obtain the decompressed code stream.
 15. A voice signal decoding device, comprising: an acquisition module, configured to acquire header information in a packed code stream; an unpacking module, configured to unpack the packed code stream according to the header information, to obtain an extension enhancement layer bit stream and a compressed core layer bit stream; a decompression module, configured to decompress the compressed core layer bit stream to obtain a decompressed code stream; and a reestablishment module, configured to perform decoding reestablishment on the extension enhancement layer bit stream and the decompressed code stream, to obtain a broadband reestablished voice signal.
 16. The device according to claim 15, wherein the unpacking module comprises: a first acquisition unit, configured to acquire side information comprised in the header information; and an unpacking unit, configured to unpack the packed code stream according to the side information, to obtain the extension enhancement layer bit stream and the compressed core layer bit stream.
 17. The device according to claim 15, wherein the decompression module comprises: a second acquisition unit, configured to acquire frame length information of a first frame during lossless decompression; and a decompression unit, configured to perform the lossless decompression on the core layer bit stream according to the frame length information, to obtain the decompressed code stream.
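
For illustration only, the frame length determination recited in claim 8 may be sketched as follows. This is a minimal, non-limiting sketch and not the claimed implementation. The set of available frame lengths (40, 80, 160, 240, and 320 samples), the function names `select_frame_length` and `split_packet`, and the `prefer_second_longest` flag are assumptions introduced for the example. The "second longest frame length" is interpreted here as the second largest available value, and a packet shorter than every available frame length, a case the claim does not address, is treated as an error.

```python
from typing import List, Sequence

# Hypothetical available frame lengths in samples (an assumption for this
# example; the claims do not fix these values).
AVAILABLE_FRAME_LENGTHS = (40, 80, 160, 240, 320)


def select_frame_length(
    packet_len: int,
    available: Sequence[int] = AVAILABLE_FRAME_LENGTHS,
    prefer_second_longest: bool = False,
) -> int:
    """Pick the frame length to process next for a packet of packet_len samples."""
    lengths = sorted(set(available))
    longest = lengths[-1]

    # Packet longer than the longest frame: use the longest frame length,
    # or (alternative branch) the second longest one.
    if packet_len > longest:
        if prefer_second_longest and len(lengths) > 1:
            return lengths[-2]
        return longest

    # Packet length matches an available frame length: use it directly.
    if packet_len in lengths:
        return packet_len

    # Otherwise use the longest available frame length below the packet length.
    shorter = [n for n in lengths if n < packet_len]
    if not shorter:
        # Not covered by the claim; this sketch rejects such packets.
        raise ValueError("packet shorter than every available frame length")
    return shorter[-1]


def split_packet(
    packet_len: int, available: Sequence[int] = AVAILABLE_FRAME_LENGTHS
) -> List[int]:
    """Apply select_frame_length repeatedly until the packet is consumed."""
    frames = []
    remaining = packet_len
    while remaining > 0:
        frame = select_frame_length(remaining, available)
        frames.append(frame)
        remaining -= frame
    return frames
```

Under these assumed frame lengths, a 400-sample packet would be processed as a 320-sample frame followed by an 80-sample frame, and a 200-sample packet as a 160-sample frame followed by a 40-sample frame.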